Importance sampling in reinforcement learning with an estimated behavior policy

Abstract

In reinforcement learning, importance sampling is a widely used method for evaluating an expectation under the distribution of data of one policy when the data has in fact been generated by a different policy. Importance sampling requires computing the likelihood ratio between the action probabilities of a target policy and those of the data-producing behavior policy. In this article, we study importance sampling where the behavior policy action probabilities are replaced by their maximum likelihood estimate under the observed data. We show that this general technique reduces variance due to sampling error in Monte Carlo style estimators. We introduce two novel estimators that use this technique to estimate expected values that arise in the RL literature. We find that these estimators reduce the variance of Monte Carlo sampling methods, leading to faster learning for policy gradient algorithms and more accurate off-policy evaluation. We also provide theoretical analysis showing that our new estimators are consistent and have asymptotically lower variance than ...
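The idea in the abstract can be illustrated with a minimal sketch, assuming a single-step (bandit) setting with discrete actions. The function names and the toy data are hypothetical and chosen for illustration only; this is not the paper's pair of estimators. The sketch compares importance weights built from the true behavior probabilities with weights built from the empirical action frequencies, which are the maximum likelihood estimate for a categorical distribution.

```python
import numpy as np


def ordinary_is(actions, rewards, target_probs, behavior_probs):
    """Importance sampling with the true behavior action probabilities."""
    weights = target_probs[actions] / behavior_probs[actions]
    return float(np.mean(weights * rewards))


def estimated_behavior_is(actions, rewards, target_probs):
    """Importance sampling with the behavior probabilities replaced by
    their maximum likelihood estimate (empirical action frequencies)."""
    counts = np.bincount(actions, minlength=len(target_probs))
    mle_probs = counts / counts.sum()
    # Every observed action has a count of at least 1, so no division by zero.
    weights = target_probs[actions] / mle_probs[actions]
    return float(np.mean(weights * rewards))


# Toy data (assumed for illustration): two actions, reward 1 for action 0
# and reward 0 for action 1. The behavior policy is uniform, but this
# particular sample happens to contain action 0 three times out of four.
actions = np.array([0, 0, 0, 1])
rewards = np.array([1.0, 1.0, 1.0, 0.0])
behavior_probs = np.array([0.5, 0.5])
target_probs = np.array([0.9, 0.1])

ois_value = ordinary_is(actions, rewards, target_probs, behavior_probs)
est_value = estimated_behavior_is(actions, rewards, target_probs)
print(ois_value, est_value)
```

In this toy sample the true value of the target policy is 0.9. The estimate with the true behavior probabilities is about 1.35, because action 0 was sampled more often than its probability of 0.5 suggests. The estimate with the empirical frequencies is about 0.9, because the estimated weights correct for that sampling error. The correction is exact here only because rewards are deterministic for each action.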

Similar resources

Importance Sampling for Reinforcement Learning with Multiple

This thesis considers three complications that arise from applying reinforcement learning to a real-world application. In the process of using reinforcement learning to build an adaptive electronic market-maker, we find the sparsity of data, the partial observability of the domain, and the multiple objectives of the agent to cause serious problems for existing reinforcement learning algorithms....

Truncated Importance Sampling for Reinforcement Learning with Experience Replay

Reinforcement Learning (RL) is considered here as an adaptation technique of neural controllers of machines. The goal is to make Actor-Critic algorithms require less agent-environment interaction to obtain policies of the same quality, at the cost of additional background computations. We propose to achieve this goal in the spirit of experience replay. An estimation method of improvement direct...

Importance sampling for reinforcement learning with multiple objectives

Competitive-Cooperative-Concurrent Reinforcement Learning with Importance Sampling

The speed and performance of learning depend on the complexity of the learner. A simple learner with few parameters and no internal states can quickly obtain a reactive policy, but its performance is limited. A learner with many parameters and internal states may finally achieve high performance, but it may take enormous time for learning. Therefore, it is difficult to decide in advance which a...

Policy Learning by GA using Importance Sampling

The most difficult problem of applying GA to a policy learning is that interactions with the environment require much time to evaluate the individuals. In this paper, we propose a new approach to estimate the individual’s value using importance sampling. Importance sampling reuses the experiences obtained by some policy to estimate values of the other policies. The proposed technique cuts down ...

Journal

Journal title: Machine Learning

Year: 2021

ISSN: 0885-6125, 1573-0565

DOI: https://doi.org/10.1007/s10994-020-05938-9